Parsing Idioms in Lexicalized TAGs

Authors

  • Anne Abeillé
  • Yves Schabes
Abstract

We show how idioms can be parsed in lexicalized TAGs. We rely on extensive studies of frozen phrases pursued at L.A.D.L.¹ that show that idioms are pervasive in natural language and obey, generally speaking, the same morphological and syntactic patterns as 'free' structures. By idiom we mean a structure in which some items are lexically frozen and whose semantics is not compositional. We thus consider idioms of different syntactic categories (NP, S, adverbials, compound prepositions, and so on) in both English and French.

In lexicalized TAGs, the same grammar is used for idioms as for 'free' sentences. We assign them regular syntactic structures while representing them semantically as one non-compositional entry. Syntactic transformations and insertion of modifiers may thus apply to them as to any 'free' structure. Unlike in previous approaches, their variability becomes the general case and their being totally frozen the exception.

Idioms are generally represented by extended elementary trees with 'heads' made out of several items (which need not be contiguous), with one of the items serving as an index. When an idiomatic tree is selected by this index, lexical items are attached to some nodes in the tree. Idiomatic trees are selected by a single head node; however, the head value imposes lexical values on other nodes in the tree. This operation of attaching the head item of an idiom and its lexical parts is called lexical attachment. The resulting tree has the lexical items corresponding to the pieces of the idiom already attached to it.

* This work is partially supported (for the second author) by ARO grant DAA29-84-9-007, DARPA grant N0014-85-K0018, NSF grants MCS-82-191169 and DCR84-10413. We have benefitted immensely from our discussions with Aravind Joshi, Maurice Gross and Mitch Marcus. We also want to thank Kathleen Bishop and Sharon Cote.

¹ Laboratoire d'Automatique Documentaire et Linguistique, University of Paris 7.
We generalize the parsing strategy defined for lexicalized TAGs to the case of 'heads' made out of several items. We propose to parse idioms in two steps, which are merged into the two-step parsing strategy defined for 'free' sentences. The first step, performed during the lexical pass, selects trees corresponding to the literal and idiomatic interpretations. However, idiomatic trees are not always selected as possible candidates: we require that all the basic pieces building the minimal idiomatic expression be present in the input string (possibly with some order constraints). This condition is necessary for the idiomatic reading, but of course it is not sufficient. The second step performs the syntactic analysis as in the usual case. During the second step, the idiomatic reading may be rejected.

Idioms are thus parsed like any 'free' sentence. Except during the selection process, idioms do not require any special parsing mechanism. We are also able to account for cases of ambiguity between idiomatic and literal interpretations. Factoring recursion from dependencies in TAGs allows discontinuous constituents to be parsed in an elegant way. We also show how regular 'transformations' are taken into account by the parser.

Topics: Parsing, Idioms.

1 Introduction to Tree Adjoining Grammars

Tree Adjoining Grammars (TAGs) were introduced by Joshi et al. 1975 and Joshi 1985 as a formalism for linguistic description. Their linguistic relevance was shown by Kroch and Joshi 1985 and Abeillé 1988. A lexicalized version of the formalism was presented in Schabes, Abeillé and Joshi 1988 that makes them attractive for writing computational grammars. They were proved to be
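The selection condition stated in the abstract (an idiomatic tree becomes a candidate only if every piece of the minimal idiom is present in the input, possibly in a required order) can be illustrated with a small sketch. This is not the authors' parser. The names `IdiomEntry`, `pieces_present`, `select_trees`, the two toy lexicons, and the tree names are all hypothetical, trees are stood in for by strings, and the sketch matches surface forms only, ignoring the morphological variation the paper allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdiomEntry:
    """Hypothetical lexicon entry for one idiomatic elementary tree."""
    index: str            # the item of the multi-item head that selects the tree
    pieces: tuple         # every frozen item of the minimal idiom, index included
    tree: str             # placeholder name standing in for the elementary tree
    ordered: bool = True  # whether the pieces must appear in this order


def pieces_present(pieces, tokens, ordered):
    """Return True if every piece occurs in tokens (as a subsequence if ordered)."""
    if ordered:
        remaining = iter(tokens)
        # `piece in remaining` consumes the iterator up to the match, so each
        # later piece is only searched for to the right of the earlier ones.
        return all(piece in remaining for piece in pieces)
    pool = list(tokens)
    for piece in pieces:
        if piece not in pool:
            return False
        pool.remove(piece)
    return True


def select_trees(tokens, literal_lexicon, idiom_lexicon):
    """First (lexical) pass: collect candidate trees for the input tokens.

    Literal trees are always selected. An idiomatic tree is selected only when
    all of its pieces are present, which is necessary but not sufficient: a
    later syntactic pass (not sketched here) may still reject the reading.
    """
    selected = []
    for token in tokens:
        selected.extend(literal_lexicon.get(token, []))
        for entry in idiom_lexicon.get(token, []):
            if pieces_present(entry.pieces, tokens, entry.ordered):
                selected.append(entry.tree)
    return selected


# Toy data (assumed for illustration only).
LITERAL = {"kicked": ["alpha_transitive"]}
IDIOMS = {
    "kicked": [
        IdiomEntry(
            index="kicked",
            pieces=("kicked", "the", "bucket"),
            tree="alpha_kick_the_bucket",
        )
    ]
}

with_idiom = select_trees("John kicked the proverbial bucket".split(), LITERAL, IDIOMS)
literal_only = select_trees("John kicked the ball".split(), LITERAL, IDIOMS)
```

In this toy run, `with_idiom` contains both the literal and the idiomatic tree even though a modifier separates the pieces, while `literal_only` contains only the literal tree because "bucket" is missing from the input.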

Similar resources

Lexicalized TAGs, Parsing and Lexicons

In our approach, each elementary structure is systematically associated with a lexical head. These structures specify extended domains of locality (as compared to a context-free grammar) over which constraints can be stated. These constraints either hold within the elementary structure itself or specify what other structures can be composed with a given elementary structure. The 'grammar' consi...

A comparative study of the impact of part-of-speech tagging on parsing in automatic processing of the Persian language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Parsing strategies with 'lexicalized' grammars: application to Tree Adjoining Grammars

In this paper, we present a parsing strategy that arose from the development of an Earley-type parsing algorithm for TAGs (Schabes and Joshi 1988) and from some recent linguistic work in TAGs (Abeillé 1988a). In our approach, each elementary structure is systematically associated with a lexical head. These structures specify extended domains of locality (as compared to a context-free grammar) ...

Lemmatization and Lexicalized Statistical Parsing of Morphologically-Rich Languages: the Case of French

This paper shows that training a lexicalized parser on a lemmatized morphologically-rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop in performance when automatically assigned lemmas and POS tags ar...

A principle-based hierarchical representation of LTAGs

Lexicalized Tree Adjoining Grammars have proved useful for NLP. However, numerous redundancy problems face LTAG developers, as highlighted by Vijay-Shanker and Schabes (92). We present a compact hierarchical organization of syntactic descriptions that is linguistically motivated, and a tool that automatically generates the tree families of an LTAG. The tool starts from the synta...

Publication date: 1989